Intelligent colocation of HPC workloads

Abstract

Many HPC applications suffer from a bottleneck in the shared caches, instruction execution units, I/O or memory bandwidth, even though the remaining resources may be underutilized. It is hard for developers and runtime systems to ensure that all critical resources are fully exploited by a single application, so an attractive technique for increasing system utilization is to colocate multiple applications on the same server. When applications share critical resources, however, contention may lead to reduced application performance. In this paper, we show that server efficiency can be improved by first modeling the expected performance degradation of colocated applications based on measured hardware counters, and then exploiting the model to determine an optimized mix of colocated applications. This paper presents a new intelligent resource manager and makes the following contributions: (1) a machine learning model to predict the performance degradation of colocated applications based on hardware counters and (2) an intelligent scheduling scheme deployed on an existing resource manager to enable application co-scheduling with minimum performance degradation. Our results show that our approach achieves performance improvements of 7% on average and up to 12% compared to the standard policy commonly used by existing job managers.

Highlights

• Colocating jobs on the same machine increases system utilization.
• Jobs contending for shared resources experience performance degradation.
• Hardware counters characterize contention for each shared resource.
• A greedy strategy can improve scheduling in batch systems.
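The two-step idea in the abstract (predict the degradation of a colocated pair, then choose a mix that keeps it low) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `predict_degradation` is a hand-written stand-in for a learned model, the resource names and demand values in `EXAMPLE_JOBS` are invented, and the greedy pairing rule is a generic one, not the scheduling scheme from the paper.

```python
from itertools import combinations

# Hypothetical per-job resource demand in [0, 1], standing in for
# features derived from hardware counters. Values are made up.
EXAMPLE_JOBS = {
    "cfd": {"membw": 0.9, "cache": 0.6, "fpu": 0.2},
    "md":  {"membw": 0.2, "cache": 0.3, "fpu": 0.9},
    "fft": {"membw": 0.8, "cache": 0.7, "fpu": 0.3},
    "mc":  {"membw": 0.1, "cache": 0.2, "fpu": 0.8},
}


def predict_degradation(a, b):
    """Stand-in for a trained model (assumption, not the paper's model).

    Scores a pair by how much their demands overlap on each resource:
    two jobs that both stress the same resource get a high score.
    Both dicts are expected to have the same keys.
    """
    return sum(a[k] * b[k] for k in a)


def greedy_colocate(jobs, max_degradation):
    """Pair jobs greedily, lowest predicted degradation first.

    Returns (pairs, alone): the chosen pairs and the jobs left to run
    on their own because every remaining pairing exceeds the threshold.
    """
    scored = sorted(
        (predict_degradation(jobs[x], jobs[y]), x, y)
        for x, y in combinations(sorted(jobs), 2)
    )
    placed, pairs = set(), []
    for score, x, y in scored:
        if score > max_degradation:
            break  # all later candidates are at least as bad
        if x in placed or y in placed:
            continue  # each job is colocated at most once
        pairs.append((x, y))
        placed.update((x, y))
    alone = [name for name in sorted(jobs) if name not in placed]
    return pairs, alone


if __name__ == "__main__":
    print(greedy_colocate(EXAMPLE_JOBS, max_degradation=0.7))
```

With these invented numbers, the memory-bound jobs end up paired with the compute-bound ones. A greedy pass like this is simple and fast but does not guarantee the globally best mix; lowering `max_degradation` trades utilization for less interference by leaving more jobs on dedicated nodes.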

Related articles

The Case For Colocation of HPC Workloads

The current state of practice in supercomputer resource allocation places jobs from different users on disjoint nodes both in terms of time and space. While this approach largely guarantees that jobs from different users do not degrade one another’s performance, it does so at high cost to system throughput and energy efficiency. This focused study presents job striping, a technique that signifi...

Energy-Efficiency Evaluation of Intel KNL for HPC Workloads

Energy consumption is increasingly becoming a limiting factor to the design of faster large-scale parallel systems, and development of energy-efficient and energy-aware applications is today a relevant issue for HPC code-developer communities. In this work we focus on energy performance of the Knights Landing (KNL) Xeon Phi, the latest many-core architecture processor introduced by Intel for th...

Multiple objective scheduling of HPC workloads through dynamic prioritization

We have developed an efficient single queue scheduling system that utilizes a greedy knapsack algorithm with dynamic job priorities. Our scheduler satisfies high level objectives while maintaining high utilization of the HPC system or collection of distributed resources such as a computational GRID. We provide simulation analysis of our approach in contrast with various scheduling strategies of...

Phase Recognition from Power Traces of HPC Workloads

Prior work has shown that power consumption traces of HPC workloads exhibit distinctive statistical characteristics, which allows the workload that generated a given power trace to be inferred with high accuracy. However, these power signatures apply to the entire power trace, with no ability to break it down further into phases or to recognize novel combinations of known workloads. In this wor...

Simulation of HPC Job Scheduling and Large-Scale Parallel Workloads

The paper presents a simulator designed specifically for evaluating job scheduling algorithms on large-scale HPC systems. The simulator was developed based on the Performance Prediction Toolkit (PPT), which is a parallel discrete-event simulator written in Python for rapid assessment and performance prediction of large-scale scientific applications on supercomputers. The proposed job scheduler ...

Journal

Journal title: Journal of Parallel and Distributed Computing

Year: 2021

ISSN: 1096-0848, 0743-7315

DOI: https://doi.org/10.1016/j.jpdc.2021.02.010